Suppose we observe a geometrically ergodic semi-Markov process and have a parametric model for the transition distribution
of the embedded Markov chain, for the conditional distribution of the inter-arrival times, or for both. The first two models
for the process are semiparametric, and the parameters can be estimated by conditional maximum likelihood estimators.
The third model for the process is parametric, and the parameter can be estimated by an unconditional maximum likelihood
estimator. We determine heuristically the asymptotic distributions of these estimators and show that they are asymptotically
efficient. If the parametric models are not correct, the (conditional) maximum likelihood estimators estimate the parameter
that maximizes the Kullback-Leibler information. We show that they remain asymptotically efficient in a nonparametric sense.

1. Introduction

For i.i.d. observations, Daniels [6] and Huber [20] show that the maximum likelihood estimator of a misspecified
parametric model estimates the parameter that maximizes the Kullback-Leibler (KL) information,
and determine its asymptotic distribution. Weaker conditions are given by Pollard [33]. For applications
see also White [35], Müller [30], and Doksum, Ozeki, Kim and Neto [7]. Analogous results are obtained for
parametric Markov chain models by Ogata [31], for parametric time series by Hosoya [19] and by Andrews
and Pollard [1], and for parametric diffusion models by McKeague [28] and Kutoyants [25]. We refer also
to the monograph of Kutoyants [26]. Applications to time series models in econometrics are studied by
White [36] and Sin and White [34], and in the monograph of White [37].

Greenwood and Wefelmeyer [15] prove that the maximum likelihood estimator of a misspecified parametric
Markov chain model is efficient in a nonparametric sense. Related efficiency results for misspecified
parametric time series are in Dahlhaus and Wefelmeyer [5]. Here we outline corresponding results for
semi-Markov processes. We consider both parametric and semiparametric misspecified models. The arguments
are heuristic; sufficient regularity conditions can be obtained as in the above references.

Suppose we observe a semi-Markov process $Z_t$, $t \ge 0$, with values in an arbitrary measurable space $E$,
on a time interval $0 \le t \le n$. Let $(X_0, T_0), (X_1, T_1), \dots$ denote the embedded Markov renewal process. Its
transition distribution factors as

$$S(x, dy, du) = Q \otimes R(x, dy, du) = Q(x, dy)\,R(x, y, du),$$

where $Q(x, dy)$ is the transition distribution of the embedded Markov chain $X_0, X_1, \dots$, and $R(x, y, du)$ is
the conditional distribution of the inter-arrival time $U_j = T_j - T_{j-1}$ given $X_{j-1} = x$ and $X_j = y$.

We assume that the embedded Markov chain is stationary. We write $P_1(dx)$, $P_2(dx, dy)$ and $P_3(dx, dy, du)$
for the stationary laws of $X_{j-1}$, $(X_{j-1}, X_j)$ and $(X_{j-1}, X_j, U_j)$, respectively. Of course, $P_2 = P_1 \otimes Q$ and
$P_3 = P_2 \otimes R = P_1 \otimes Q \otimes R$. Set $N = \max\{j : T_j \le n\}$. We note that studying a semi-Markov process
is equivalent to studying the embedded Markov renewal process. The latter is a Markov chain. Observing
the semi-Markov process up to time $n$ is equivalent to observing the embedded Markov renewal process
up to the random time $N$.

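Natural estimators for $P_1$, $P_2$ and $P_3$ are the empirical distributions

$$\mathbb{P}_1 = \frac{1}{N}\sum_{j=1}^N \delta_{X_{j-1}}, \qquad \mathbb{P}_2 = \frac{1}{N}\sum_{j=1}^N \delta_{(X_{j-1}, X_j)}, \qquad \mathbb{P}_3 = \frac{1}{N}\sum_{j=1}^N \delta_{(X_{j-1}, X_j, U_j)},$$

where $\delta_x$ denotes the Dirac measure at a point $x$.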
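
The empirical distributions can be computed directly from an observed path of the embedded Markov renewal process. The following is a minimal sketch, not part of the paper: the function names are hypothetical, the states and jump times are assumed to be given as plain lists with $T_0 \le n$, and the state space is assumed to be finite so that the empirical version of $P_2$ can be stored as a dictionary of relative frequencies.

```python
from collections import Counter

def count_jumps(jump_times, n):
    """N = max{j : T_j <= n} for increasing jump times T_0 < T_1 < ... (assumes T_0 <= n)."""
    return max(j for j, t in enumerate(jump_times) if t <= n)

def empirical_p2(states, N):
    """Empirical version of P_2: relative frequencies of the pairs (X_{j-1}, X_j), j = 1..N."""
    counts = Counter(zip(states[:N], states[1:N + 1]))
    return {pair: c / N for pair, c in counts.items()}

def empirical_p3(states, jump_times, N, f):
    """Empirical average (1/N) * sum_j f(X_{j-1}, X_j, U_j) with U_j = T_j - T_{j-1}."""
    total = 0.0
    for j in range(1, N + 1):
        u = jump_times[j] - jump_times[j - 1]
        total += f(states[j - 1], states[j], u)
    return total / N
```

For example, with states `['a', 'b', 'a', 'a', 'b']`, jump times `[0.0, 1.0, 2.5, 3.0, 7.0]` and `n = 5.0`, only the first three transitions are used, since the fourth jump occurs after time 5.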

Let $\Theta$ be an open subset of $\mathbb{R}^d$. We consider the following three models for the semi-Markov process. In
Model Q we assume a parametric form $Q = Q_\vartheta$, $\vartheta \in \Theta$, of the transition distribution of the embedded
Markov chain. These models are also considered in Greenwood, Müller and Wefelmeyer. In Model R
we assume a parametric form $R = R_\vartheta$, $\vartheta \in \Theta$, of the conditional distribution of the inter-arrival times. In
Model S we assume parametric forms $Q = Q_\vartheta$ and $R = R_\vartheta$, $\vartheta \in \Theta$, for both. Of course, the last model
covers the case that $Q$ and $R$ carry different parameters. We assume that $Q_\vartheta(x, dy)$ has a density $q_\vartheta(x, y)$
with respect to some dominating measure $\mu(dy)$, and $R_\vartheta(x, y, du)$ has a density $r_\vartheta(x, y, u)$ with respect to
some dominating measure $\nu(du)$.

If Model Q holds, then the transition distribution of the semi-Markov process is semiparametric,
$S = Q_\vartheta \otimes R$, with $R$ an infinite-dimensional nuisance parameter. A natural estimator of $\vartheta$ is the partial maximum
likelihood estimator $\hat\vartheta_Q$, which maximizes

$$\mathbb{P}_2[\log q_\vartheta] = \frac{1}{N}\sum_{j=1}^N \log q_\vartheta(X_{j-1}, X_j).$$

Suppose that Model Q is misspecified, and that the true transition distribution of the embedded Markov
chain is $Q$. Then $\mathbb{P}_2[\log q_\vartheta]$ is an empirical version of the KL information $P_2[\log q_\vartheta]$. Let $K_Q(P_2)$ denote
the parameter that maximizes $P_2[\log q_\vartheta]$. We call $K_Q$ a KL functional. Note that the partial maximum
likelihood estimator is the empirical version of the KL functional, $\hat\vartheta_Q = K_Q(\mathbb{P}_2)$. Since Model Q is
misspecified, the semi-Markov model is nonparametric. The empirical distribution $\mathbb{P}_2$ is efficient for $P_2$ in
a certain sense. If the KL functional is smooth, i.e. compactly differentiable in an appropriate sense, it
follows that $\hat\vartheta_Q = K_Q(\mathbb{P}_2)$ is efficient for $K_Q(P_2)$. We will not use this approach in this paper. Instead
we derive, in Section 3, a stochastic expansion of $\hat\vartheta_Q$, and determine its influence function. We also show
that the KL functional $K_Q$ is pathwise differentiable, and determine its canonical gradient. To keep the
exposition simple, we do not give regularity conditions for these results. They can be adapted e.g. from
those of Greenwood and Wefelmeyer [15]. It turns out that the canonical gradient equals the influence
function of $\hat\vartheta_Q$. By the characterization of efficient estimators in Section 2, this shows that $\hat\vartheta_Q$ is efficient
in the nonparametric semi-Markov model. We also show that $\hat\vartheta_Q$ remains efficient when Model Q is true.
The advantage of our approach is that we do not need to check compact differentiability of $K_Q$ and a
corresponding efficiency property of $\mathbb{P}_2$.
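
To make the KL functional concrete, consider a toy example that is not taken from the paper: a two-state embedded chain on {0, 1} with the hypothetical one-parameter model in which the chain stays in its current state with probability theta and switches with probability 1 - theta. For this model the KL information is p log(theta) + (1 - p) log(1 - theta), where p is the stationary probability that two successive states coincide, so the KL functional equals p even when the true transition matrix is not of this form. The sketch below computes this value and its empirical version; the function names are assumptions made for illustration.

```python
def kl_functional_stay(Q):
    """K_Q(P_2) for the toy model q_theta(x, x) = theta, q_theta(x, 1 - x) = 1 - theta.

    Q is the true 2x2 transition matrix; it need not belong to the model.
    Assumes Q[0][1] + Q[1][0] > 0 so that the stationary law is unique.
    """
    pi0 = Q[1][0] / (Q[0][1] + Q[1][0])  # stationary probability of state 0
    pi1 = 1.0 - pi0
    # The maximizer of p*log(theta) + (1-p)*log(1-theta) is theta = p = P_2(X_0 = X_1).
    return pi0 * Q[0][0] + pi1 * Q[1][1]

def partial_mle_stay(states):
    """Empirical version of the KL functional: fraction of steps with X_{j-1} == X_j."""
    pairs = list(zip(states[:-1], states[1:]))
    return sum(1 for x, y in pairs if x == y) / len(pairs)
```

With the true matrix `[[0.9, 0.1], [0.4, 0.6]]`, which is outside the toy model, the stationary law is (0.8, 0.2) and the KL functional is 0.8 * 0.9 + 0.2 * 0.6 = 0.84; the partial maximum likelihood estimator is the observed fraction of stays.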

The other two models are treated analogously. If Model R holds, then the transition distribution of the
semi-Markov process is semiparametric, $S = Q \otimes R_\vartheta$, with $Q$ an infinite-dimensional nuisance parameter.
A natural estimator of $\vartheta$ is the partial maximum likelihood estimator $\hat\vartheta_R$, which maximizes

$$\mathbb{P}_3[\log r_\vartheta] = \frac{1}{N}\sum_{j=1}^N \log r_\vartheta(X_{j-1}, X_j, U_j).$$

Suppose that Model R is misspecified, and that the true conditional distribution of the inter-arrival times is
$R$. Then $\mathbb{P}_3[\log r_\vartheta]$ is an empirical version of $P_3[\log r_\vartheta]$. Again we call the latter KL information. We denote
by $K_R(P_3)$ the parameter that maximizes $P_3[\log r_\vartheta]$, and we call $K_R$ a KL functional. Then $\hat\vartheta_R = K_R(\mathbb{P}_3)$.
In Section 4 we derive a stochastic expansion of $\hat\vartheta_R$ and the canonical gradient of $K_R$ and show that $\hat\vartheta_R$
is efficient in the nonparametric semi-Markov model. We also show that $\hat\vartheta_R$ remains efficient when Model
R is true.

If Model S holds, then the transition distribution of the semi-Markov process is parametric, $S_\vartheta = Q_\vartheta \otimes R_\vartheta$.
Set

$$s_\vartheta(x, y, u) = q_\vartheta(x, y)\,r_\vartheta(x, y, u).$$

A natural estimator of $\vartheta$ is the maximum likelihood estimator $\hat\vartheta_S$, which maximizes

$$\mathbb{P}_3[\log s_\vartheta] = \mathbb{P}_2[\log q_\vartheta] + \mathbb{P}_3[\log r_\vartheta] = \frac{1}{N}\sum_{j=1}^N \log q_\vartheta(X_{j-1}, X_j) + \frac{1}{N}\sum_{j=1}^N \log r_\vartheta(X_{j-1}, X_j, U_j).$$

Suppose that Model S is misspecified, and that the true transition distribution of the embedded Markov
renewal process is $S = Q \otimes R$. Then $\mathbb{P}_3[\log s_\vartheta]$ is an empirical version of $P_3[\log s_\vartheta]$. Again we call the
latter KL information. We denote by $K_S(P_3)$ the parameter that maximizes $P_3[\log s_\vartheta]$, and we call $K_S$ a
KL functional. Then $\hat\vartheta_S = K_S(\mathbb{P}_3)$. In Section 5 we derive a stochastic expansion of $\hat\vartheta_S$ and the canonical
gradient of $K_S$ and show that $\hat\vartheta_S$ is efficient in the nonparametric semi-Markov model. We also show that
$\hat\vartheta_S$ remains efficient when Model S is true. Section 6 contains some additional comments.



2. Characterization of efficient estimators 

We assume that the embedded Markov chain is positive Harris recurrent and geometrically ergodic in
$L_2(P_2)$. We make the usual assumption that the conditional distribution of the inter-arrival times does
not charge zero. We also assume that the mean inter-arrival time $m = EU_j$ is finite. Then

$$n/N \to m \quad \text{a.s.} \tag{1}$$

For a function $f \in L_2(P_3)$ we have the strong law of large numbers

$$\frac{1}{N}\sum_{j=1}^N f(X_{j-1}, X_j, U_j) \to P_3[f] \quad \text{a.s.} \tag{2}$$

For a function $f \in L_2(P_3)$ with $Sf = 0$ we have the martingale central limit theorem

$$n^{-1/2}\sum_{j=1}^N f(X_{j-1}, X_j, U_j) \Rightarrow m^{-1/2}\big(P_3[f^2]\big)^{1/2}\,Y, \tag{3}$$

where $Y$ denotes a standard normal random variable.
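
The limit (1) can be checked numerically on a toy process. The sketch below is an illustration under stated assumptions, not the paper's setting: a two-state embedded chain with transition matrix `q`, exponential inter-arrival times with means `mean_u[x][y]`, the convention $T_0 = 0$, and a fixed starting state rather than a stationary start. All names are hypothetical.

```python
import random

def simulate_jumps(q, mean_u, horizon, rng, x0=0):
    """Return N = max{j : T_j <= horizon} for a toy two-state Markov renewal process."""
    x, t, n_jumps = x0, 0.0, 0
    while True:
        y = rng.choices([0, 1], weights=q[x])[0]   # next state of the embedded chain
        u = rng.expovariate(1.0 / mean_u[x][y])    # exponential time with mean mean_u[x][y]
        if t + u > horizon:
            return n_jumps
        x, t, n_jumps = y, t + u, n_jumps + 1

def mean_interarrival(q, mean_u):
    """m = E[U_j] = sum over (x, y) of pi(x) * q[x][y] * mean_u[x][y], pi stationary."""
    pi0 = q[1][0] / (q[0][1] + q[1][0])
    pi = [pi0, 1.0 - pi0]
    return sum(pi[x] * q[x][y] * mean_u[x][y] for x in range(2) for y in range(2))
```

For `q = [[0.3, 0.7], [0.6, 0.4]]` and `mean_u = [[1.0, 2.0], [0.5, 1.5]]` the stationary law is (6/13, 7/13) and m = 16.5/13; for a long horizon n the ratio n/N from `simulate_jumps` should be close to this value.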

In order to characterize efficient estimators for functionals of semi-Markov models, we consider a family
$Q_\delta$, $\delta \in \Delta$, of transition distributions of the embedded Markov chain, and a family $R_\delta$, $\delta \in \Delta$, of conditional
distributions of the inter-arrival time. Here $\Delta$ is a possibly infinite-dimensional set, the parameter space.
We fix $\delta \in \Delta$ and set $Q = Q_\delta$, $R = R_\delta$ and

$$V = \{v \in L_2(P_2) : Qv = 0\}, \qquad W = \{w \in L_2(P_3) : Rw = 0\}.$$

Note that $V$ and $W$ can be viewed as orthogonal subspaces of $L_2(P_3)$. We assume that the parametrization
is smooth in the following sense. There is a linear space $K$, the tangent space of $\Delta$, and a bounded linear
operator $D = (D_Q, D_R) : K \to V \times W$, and for each $k \in K$ there is a sequence $\delta_{nk}$ in $\Delta$ such that
$Q_{nk} = Q_{\delta_{nk}}$ is Hellinger differentiable at $Q$ with derivative $D_Q k \in V$,

$$n\,P_1\Big[\int \big(dQ_{nk}^{1/2} - dQ^{1/2} - \tfrac{1}{2} n^{-1/2} D_Q k\, dQ^{1/2}\big)^2\Big] \to 0,$$

and $R_{nk} = R_{\delta_{nk}}$ is Hellinger differentiable at $R$ with derivative $D_R k \in W$,

$$n\,P_2\Big[\int \big(dR_{nk}^{1/2} - dR^{1/2} - \tfrac{1}{2} n^{-1/2} D_R k\, dR^{1/2}\big)^2\Big] \to 0.$$

Now write $M_n$ for the distribution of $Z_t$, $0 \le t \le n$, if $Q$ and $R$ are in effect, and $M_{nk}$ if $Q_{nk}$ and $R_{nk}$ are.
By Taylor expansion and (2) and (3), we obtain local asymptotic normality:

$$\log\frac{dM_{nk}}{dM_n} = n^{-1/2}\sum_{j=1}^N \big(D_Q k(X_{j-1}, X_j) + D_R k(X_{j-1}, X_j, U_j)\big) - \tfrac{1}{2} m^{-1}\big(P_2[(D_Q k)^2] + P_3[(D_R k)^2]\big) + o_p(1) \tag{4}$$

and

$$n^{-1/2}\sum_{j=1}^N \big(D_Q k(X_{j-1}, X_j) + D_R k(X_{j-1}, X_j, U_j)\big) \Rightarrow m^{-1/2}\big(P_2[(D_Q k)^2] + P_3[(D_R k)^2]\big)^{1/2}\,Y. \tag{5}$$



For Markov chains, different proofs are in Penev [32], Bickel [2] and Greenwood and Wefelmeyer [13]; see
also Bickel and Kwon [4]. For Markov step processes see Höpfner, Jacod and Ladelli [18] and Höpfner
[16, 17]. A proof for nonparametric semi-Markov models is in Greenwood and Wefelmeyer [14].

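We want to estimate a $d$-dimensional functional $\varphi : \Delta \to \mathbb{R}^d$ of the parameter $\delta$. We call $\varphi$ differentiable
at $\delta$ with gradient $(v_\varphi, w_\varphi)$ if $v_\varphi \in V^d$, $w_\varphi \in W^d$, and

$$n^{1/2}\big(\varphi(\delta_{nk}) - \varphi(\delta)\big) \to m^{-1}\big(P_2[v_\varphi D_Q k] + P_3[w_\varphi D_R k]\big), \qquad k \in K. \tag{6}$$

The canonical gradient $(v_\varphi^*, w_\varphi^*)$ of $\varphi$ is the componentwise projection of $(v_\varphi, w_\varphi)$ onto the closure of $(DK)^d$
in $(L_2(P_3))^d$. If $DK$ is closed in $L_2(P_3)$, we can write $(v_\varphi^*, w_\varphi^*) = (D_Q k_\varphi, D_R k_\varphi)$ for some $k_\varphi \in K^d$. This
will be the case in Sections 3-5.

An estimator $\hat\varphi$ is called regular for $\varphi$ at $\delta$ with limit $L$ if $L$ is a $d$-dimensional random vector such that

$$n^{1/2}\big(\hat\varphi - \varphi(\delta_{nk})\big) \Rightarrow L \quad \text{under } M_{nk}, \qquad k \in K.$$

The convolution theorem says that

$$L = A + m^{-1/2}\big(P_2[v_\varphi^* v_\varphi^{*\top}] + P_3[w_\varphi^* w_\varphi^{*\top}]\big)^{1/2}\,Y_d$$

in distribution, with $Y_d$ a $d$-dimensional standard normal random vector, and $A$ a $d$-dimensional random vector independent
of $Y_d$. This justifies calling $\hat\varphi$ efficient for $\varphi$ at $\delta$ if $n^{1/2}(\hat\varphi - \varphi(\delta))$ is asymptotically normal under $M_n$ with
covariance matrix $m^{-1}\big(P_2[v_\varphi^* v_\varphi^{*\top}] + P_3[w_\varphi^* w_\varphi^{*\top}]\big)$.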

An estimator $\hat\varphi$ is called asymptotically linear for $\varphi$ at $\delta$ with influence function $(a, b)$ if $a \in V^d$, $b \in W^d$,
and

$$n^{1/2}\big(\hat\varphi - \varphi(\delta)\big) = n^{-1/2}\sum_{j=1}^N \big(a(X_{j-1}, X_j) + b(X_{j-1}, X_j, U_j)\big) + o_p(1).$$

We have the following characterization. An estimator $\hat\varphi$ is regular and efficient for $\varphi$ at $\delta$ if and only if it
is asymptotically linear with influence function equal to the canonical gradient,

$$n^{1/2}\big(\hat\varphi - \varphi(\delta)\big) = n^{-1/2}\sum_{j=1}^N \big(v_\varphi^*(X_{j-1}, X_j) + w_\varphi^*(X_{j-1}, X_j, U_j)\big) + o_p(1).$$

For proofs of the convolution theorem and the characterization we refer to Bickel, Klaassen, Ritov and
Wellner [3].

To prove asymptotic linearity of estimators in misspecified models, we need the following martingale
approximation. Set $L_{2,0}(P_2) = \{f \in L_2(P_2) : P_2[f] = 0\}$. The potential $G$ of the embedded Markov chain
is defined by

$$Gf = \sum_{i=0}^\infty Q^i f, \qquad f \in L_{2,0}(P_2).$$

For $f \in L_2(P_2)$ set

$$Af(x, y) = G(f - P_2[f])(y) - QG(f - P_2[f])(x) = \sum_{j=0}^\infty \big(Q^j f(y) - Q^{j+1} f(x)\big).$$

Then $QAf = 0$ and

$$P_2[(Af)^2] = P_2[f^2] - (P_2[f])^2 + 2\sum_{i=1}^\infty P_2\big[(f - P_2[f])\,Q^i f\big].$$

Let $f \in L_2(P_3)$ and set $f_0 = f - Rf$. Then we obtain the stochastic expansion

$$n^{-1/2}\sum_{j=1}^N \big(f(X_{j-1}, X_j, U_j) - P_3[f]\big) = n^{-1/2}\sum_{j=1}^N \big(ARf(X_{j-1}, X_j) + f_0(X_{j-1}, X_j, U_j)\big) + o_p(1). \tag{7}$$

Note that $QARf = 0$ and $Sf_0 = 0$. Hence $ARf(X_{j-1}, X_j)$ and $f_0(X_{j-1}, X_j, U_j)$ are orthogonal martingale
increments. For discrete-time processes, the martingale approximation (7) is due to Gordin [9] and Gordin
and Lifsic [10]. It was discovered independently by Maigret [27], Dürr and Goldstein [8] and Greenwood
and Wefelmeyer. See also Section 17.4 in the monograph of Meyn and Tweedie [29]. The martingale
approximation (7) and the martingale central limit theorem (3) imply that

$$n^{-1/2}\sum_{j=1}^N \big(f(X_{j-1}, X_j, U_j) - P_3[f]\big) \Rightarrow m^{-1/2}\big(P_2[(ARf)^2] + P_3[(f - Rf)^2]\big)^{1/2}\,Y.$$
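
As a plausibility check of this limit (a worked special case added for illustration, not taken from the paper), suppose the embedded chain is i.i.d., so that $Q(x, dy) = P_1(dy)$, and take $f(x, y, u) = g(y)$ with $g \in L_2(P_1)$. Then $Rf = f$, and $Q^i f = P_1[g]$ is constant for $i \ge 1$, so every term of the series in the formula for $P_2[(Af)^2]$ vanishes:

$$P_2[(ARf)^2] = P_1[g^2] - (P_1[g])^2 + 2\sum_{i=1}^\infty P_1[g]\,P_2\big[f - P_2[f]\big] = \operatorname{Var} g(X_1), \qquad P_3[(f - Rf)^2] = 0.$$

The limit is therefore $m^{-1/2}\big(\operatorname{Var} g(X_1)\big)^{1/2}\,Y$, which is what one expects from roughly $N \approx n/m$ i.i.d. centered summands normalized by $n^{-1/2}$.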

To calculate canonical gradients of functionals in misspecified models, we need the following perturbation
expansion, due to Kartashov [21]-[23]:

$$n^{1/2}\big(P_{2nk}[f] - P_2[f]\big) \to P_2[D_Q k \cdot Af], \qquad k \in K. \tag{8}$$

Here $P_{2nk}$ denotes the distribution of $(X_{j-1}, X_j)$ if $Q_{nk}$ is in effect. This pathwise version of the perturbation
expansion suffices for our purposes. Greenwood and Wefelmeyer show that it follows also from the
martingale approximation (7).
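
3. Model Q

In Model Q we assume a parametric model $q_\vartheta$, $\vartheta \in \Theta \subset \mathbb{R}^d$, for the $\mu$-density of the transition distribution
of the embedded Markov chain, and consider the conditional inter-arrival time distribution as unknown.
Suppose the model is misspecified, and the true transition distribution is $Q$. Then the KL functional
$K_Q(P_2)$ maximizes $P_2[\log q_\vartheta]$, and the partial maximum likelihood estimator $\hat\vartheta_Q$ maximizes $\mathbb{P}_2[\log q_\vartheta]$.
Write

$$\chi_\vartheta(x, y) = \partial_\vartheta \log q_\vartheta(x, y)$$

for the $d$-dimensional vector of partial derivatives of $\log q_\vartheta(x, y)$. Then $K_Q(P_2)$ solves $P_2[\chi_\vartheta] = 0$, and $\hat\vartheta_Q$
solves $\mathbb{P}_2[\chi_\vartheta] = 0$. Heuristically, by Taylor expansion,

$$\begin{aligned}
0 = \mathbb{P}_2[\chi_{\hat\vartheta_Q}] &= \frac{1}{N}\sum_{j=1}^N \chi_{\hat\vartheta_Q}(X_{j-1}, X_j) \\
&= \frac{1}{N}\sum_{j=1}^N \chi_{K_Q(P_2)}(X_{j-1}, X_j) + \frac{1}{N}\sum_{j=1}^N \dot\chi_{K_Q(P_2)}(X_{j-1}, X_j)\,\big(\hat\vartheta_Q - K_Q(P_2)\big) + o_p(n^{-1/2}).
\end{aligned} \tag{9}$$

Here $\dot\chi_\vartheta(x, y)$ is the $d \times d$ matrix of partial derivatives of $\chi_\vartheta(x, y)$. With (1) and (2) we obtain

$$n^{1/2}\big(\hat\vartheta_Q - K_Q(P_2)\big) = -m\,\big(P_2[\dot\chi_{K_Q(P_2)}]\big)^{-1}\, n^{-1/2}\sum_{j=1}^N \chi_{K_Q(P_2)}(X_{j-1}, X_j) + o_p(1). \tag{10}$$

If Model Q is correctly specified and $Q = Q_\vartheta$, then $K_Q(P_2) = \vartheta$. We also have the following relations,
which are well-known in the i.i.d. case,

$$0 = \partial_\vartheta Q_\vartheta(\cdot, E) = Q_\vartheta \chi_\vartheta, \qquad 0 = \partial_\vartheta Q_\vartheta \chi_\vartheta = Q_\vartheta \chi_\vartheta \chi_\vartheta^\top + Q_\vartheta \dot\chi_\vartheta.$$

In particular, the partial Fisher information matrix for Model Q is $I_\vartheta = -P_2[\dot\chi_\vartheta] = P_2[\chi_\vartheta \chi_\vartheta^\top]$. Hence, for
the correctly specified model, the partial maximum likelihood estimator $\hat\vartheta_Q$ has the stochastic expansion

$$n^{1/2}\big(\hat\vartheta_Q - \vartheta\big) = m\,I_\vartheta^{-1}\, n^{-1/2}\sum_{j=1}^N \chi_\vartheta(X_{j-1}, X_j) + o_p(1).$$

This means that $\hat\vartheta_Q$ is asymptotically linear with influence function $m I_\vartheta^{-1}(\chi_\vartheta, 0)$, and $n^{1/2}(\hat\vartheta_Q - \vartheta)$ is
asymptotically normal with covariance matrix $m I_\vartheta^{-1}$.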
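
The two score relations and the covariance matrix can be verified as follows. This is a sketch added for the reader's convenience; it assumes that differentiation under the integral sign is permitted and uses the multivariate version of the central limit theorem (3). Differentiating $\int q_\vartheta(x, y)\,\mu(dy) = 1$ once and twice gives

$$0 = \int \partial_\vartheta q_\vartheta\, d\mu = \int \chi_\vartheta\, q_\vartheta\, d\mu = Q_\vartheta \chi_\vartheta, \qquad 0 = \int \partial_\vartheta(\chi_\vartheta\, q_\vartheta)\, d\mu = \int \big(\dot\chi_\vartheta + \chi_\vartheta \chi_\vartheta^\top\big)\, q_\vartheta\, d\mu = Q_\vartheta \dot\chi_\vartheta + Q_\vartheta \chi_\vartheta \chi_\vartheta^\top.$$

Integrating the second relation with respect to $P_1$ yields $-P_2[\dot\chi_\vartheta] = P_2[\chi_\vartheta \chi_\vartheta^\top] = I_\vartheta$. Since $Q_\vartheta \chi_\vartheta = 0$, the sum $n^{-1/2}\sum_{j=1}^N \chi_\vartheta(X_{j-1}, X_j)$ is asymptotically normal with covariance matrix $m^{-1} I_\vartheta$, and the limiting covariance matrix of the estimator is

$$m I_\vartheta^{-1}\,\big(m^{-1} I_\vartheta\big)\, m I_\vartheta^{-1} = m I_\vartheta^{-1}.$$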

If the model is misspecified, then $\chi_{K_Q(P_2)}$ is not in $V^d$. We apply the martingale approximation (7)
to (10) and see that $\hat\vartheta_Q$ is asymptotically linear with influence function $-m\big(P_2[\dot\chi_{K_Q(P_2)}]\big)^{-1}\big(A\chi_{K_Q(P_2)}, 0\big)$.
Hence $n^{1/2}(\hat\vartheta_Q - K_Q(P_2))$ is asymptotically normal with covariance matrix

$$m\,\big(P_2[\dot\chi_{K_Q(P_2)}]\big)^{-1}\, P_2\big[A\chi_{K_Q(P_2)}\,(A\chi_{K_Q(P_2)})^\top\big]\,\big(P_2[\dot\chi_{K_Q(P_2)}]\big)^{-1}.$$

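Let us now prove efficiency of $\hat\vartheta_Q$, first for the correctly specified model. For $c \in \mathbb{R}^d$ set $\vartheta_{nc} = \vartheta + n^{-1/2} c$.
Assume that $q_{nc} = q_{\vartheta_{nc}}$ is Hellinger differentiable at $\vartheta$,

$$n \iint \big(q_{nc}^{1/2}(x, y) - q_\vartheta^{1/2}(x, y) - \tfrac{1}{2} n^{-1/2} c^\top \chi_\vartheta(x, y)\, q_\vartheta^{1/2}(x, y)\big)^2\, \mu(dy)\, P_1(dx) \to 0. \tag{11}$$

Let $\mathcal{R}$ denote the set of all conditional inter-arrival distributions. For $w \in W$ choose a sequence $R_{nw}$ in $\mathcal{R}$
that is Hellinger differentiable at $R$,

$$n\,P_2\Big[\int \big(dR_{nw}^{1/2} - dR^{1/2} - \tfrac{1}{2} n^{-1/2} w\, dR^{1/2}\big)^2\Big] \to 0. \tag{12}$$

Then the assumptions of Section 2 hold with $\Delta = \Theta \times \mathcal{R}$, $K = \mathbb{R}^d \times W$, $D_Q(c, w) = c^\top \chi_\vartheta$, $D_R(c, w) = w$.
The functional to be estimated is $\varphi(\vartheta, R) = \vartheta$. By orthogonality of $V$ and $W$, its canonical gradient is
obtained from (6) as $(c_\vartheta^\top \chi_\vartheta, 0)$ with $d \times d$ matrix $c_\vartheta$ determined by

$$c = m^{-1} P_2[c_\vartheta^\top \chi_\vartheta \chi_\vartheta^\top]\, c = m^{-1} c_\vartheta^\top I_\vartheta\, c, \qquad c \in \mathbb{R}^d,$$

i.e. $c_\vartheta = m I_\vartheta^{-1}$. Hence the canonical gradient of $\vartheta$ is $m I_\vartheta^{-1}(\chi_\vartheta, 0)$ and equals the influence function of $\hat\vartheta_Q$,
which is therefore efficient for the correctly specified model.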

Suppose now that the model is misspecified, and let $\mathcal{Q}$ be the set of all transition distributions of the
embedded Markov chain. Let $Q$ denote the true transition distribution. For $v \in V$ choose a sequence $Q_{nv}$
in $\mathcal{Q}$ that is Hellinger differentiable at $Q$,

$$n\,P_1\Big[\int \big(dQ_{nv}^{1/2} - dQ^{1/2} - \tfrac{1}{2} n^{-1/2} v\, dQ^{1/2}\big)^2\Big] \to 0. \tag{13}$$

Then the assumptions of Section 2 hold with $\Delta = \mathcal{Q} \times \mathcal{R}$, $K = V \times W$, $D_Q(v, w) = v$, $D_R(v, w) = w$. The
functional to be estimated is $\varphi(Q, R) = K_Q(P_2)$. Heuristically,

$$0 = P_{2nv}[\chi_{K_Q(P_{2nv})}] = P_{2nv}[\chi_{K_Q(P_2)}] + P_{2nv}[\dot\chi_{K_Q(P_2)}]\,\big(K_Q(P_{2nv}) - K_Q(P_2)\big) + o(n^{-1/2}).$$

With $P_{2nv}[\dot\chi_{K_Q(P_2)}] \to P_2[\dot\chi_{K_Q(P_2)}]$ we obtain

$$K_Q(P_{2nv}) - K_Q(P_2) = -\big(P_2[\dot\chi_{K_Q(P_2)}]\big)^{-1} P_{2nv}[\chi_{K_Q(P_2)}] + o(n^{-1/2}).$$

The perturbation expansion (8) yields

$$n^{1/2} P_{2nv}[\chi_{K_Q(P_2)}] = n^{1/2}(P_{2nv} - P_2)[\chi_{K_Q(P_2)}] \to P_2[v\,A\chi_{K_Q(P_2)}]. \tag{14}$$

Hence

$$n^{1/2}\big(K_Q(P_{2nv}) - K_Q(P_2)\big) \to -\big(P_2[\dot\chi_{K_Q(P_2)}]\big)^{-1} P_2[v\,A\chi_{K_Q(P_2)}], \qquad v \in V,$$

and the canonical gradient of $K_Q$ is obtained from (6) as $-m\big(P_2[\dot\chi_{K_Q(P_2)}]\big)^{-1}\big(A\chi_{K_Q(P_2)}, 0\big)$ and equals the
influence function of $\hat\vartheta_Q$, which is therefore efficient for the misspecified model.



4. Model R 

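Model R is completely analogous to Model Q, with interchanged roles of the transition distribution $Q$ of
the embedded Markov chain, and the conditional inter-arrival time distribution $R$. Specifically, in Model
R we assume a parametric model $r_\vartheta$, $\vartheta \in \Theta \subset \mathbb{R}^d$, for the $\nu$-density of the conditional inter-arrival time,
and consider the transition distribution of the embedded Markov chain as unknown. Suppose the model is
misspecified, and the true conditional inter-arrival time distribution is $R$. Then the KL functional $K_R(P_3)$
maximizes $P_3[\log r_\vartheta]$, and the partial maximum likelihood estimator $\hat\vartheta_R$ maximizes $\mathbb{P}_3[\log r_\vartheta]$. Write

$$\varrho_\vartheta(x, y, u) = \partial_\vartheta \log r_\vartheta(x, y, u)$$

for the $d$-dimensional vector of partial derivatives of $\log r_\vartheta(x, y, u)$. Then $K_R(P_3)$ solves $P_3[\varrho_\vartheta] = 0$, and
$\hat\vartheta_R$ solves $\mathbb{P}_3[\varrho_\vartheta] = 0$. Heuristically, by Taylor expansion,

$$\begin{aligned}
0 = \mathbb{P}_3[\varrho_{\hat\vartheta_R}] &= \frac{1}{N}\sum_{j=1}^N \varrho_{\hat\vartheta_R}(X_{j-1}, X_j, U_j) \\
&= \frac{1}{N}\sum_{j=1}^N \varrho_{K_R(P_3)}(X_{j-1}, X_j, U_j) + \frac{1}{N}\sum_{j=1}^N \dot\varrho_{K_R(P_3)}(X_{j-1}, X_j, U_j)\,\big(\hat\vartheta_R - K_R(P_3)\big) + o_p(n^{-1/2}).
\end{aligned} \tag{15}$$

Here $\dot\varrho_\vartheta(x, y, u)$ is the $d \times d$ matrix of partial derivatives of $\varrho_\vartheta(x, y, u)$. With (1) and (2) we obtain

$$n^{1/2}\big(\hat\vartheta_R - K_R(P_3)\big) = -m\,\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1}\, n^{-1/2}\sum_{j=1}^N \varrho_{K_R(P_3)}(X_{j-1}, X_j, U_j) + o_p(1). \tag{16}$$

If Model R is correctly specified and $R = R_\vartheta$, then $K_R(P_3) = \vartheta$. We also have the following relations,

$$0 = \partial_\vartheta R_\vartheta(\cdot, \cdot, \mathbb{R}) = R_\vartheta \varrho_\vartheta, \qquad 0 = \partial_\vartheta R_\vartheta \varrho_\vartheta = R_\vartheta \varrho_\vartheta \varrho_\vartheta^\top + R_\vartheta \dot\varrho_\vartheta.$$

In particular, the partial Fisher information matrix for Model R is $J_\vartheta = -P_3[\dot\varrho_\vartheta] = P_3[\varrho_\vartheta \varrho_\vartheta^\top]$. Hence, for
the correctly specified model, the partial maximum likelihood estimator $\hat\vartheta_R$ has the stochastic expansion

$$n^{1/2}\big(\hat\vartheta_R - \vartheta\big) = m\,J_\vartheta^{-1}\, n^{-1/2}\sum_{j=1}^N \varrho_\vartheta(X_{j-1}, X_j, U_j) + o_p(1).$$

This means that $\hat\vartheta_R$ is asymptotically linear with influence function $m J_\vartheta^{-1}(0, \varrho_\vartheta)$, and $n^{1/2}(\hat\vartheta_R - \vartheta)$ is
asymptotically normal with covariance matrix $m J_\vartheta^{-1}$.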

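If the model is misspecified, then $\varrho_{K_R(P_3)}$ is not in $W^d$. We apply the martingale approximation (7) to
(16) and see that $\hat\vartheta_R$ is asymptotically linear with influence function

$$-m\,\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1}\big(AR\varrho_{K_R(P_3)},\ \varrho_{K_R(P_3)} - R\varrho_{K_R(P_3)}\big).$$

Hence $n^{1/2}(\hat\vartheta_R - K_R(P_3))$ is asymptotically normal with covariance matrix

$$m\,\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1}\,\Sigma_R\,\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1},$$

where

$$\Sigma_R = P_2\big[AR\varrho_{K_R(P_3)}\,(AR\varrho_{K_R(P_3)})^\top\big] + P_3\big[(\varrho_{K_R(P_3)} - R\varrho_{K_R(P_3)})(\varrho_{K_R(P_3)} - R\varrho_{K_R(P_3)})^\top\big].$$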

Let us now prove efficiency of $\hat\vartheta_R$, first for the correctly specified model. For $c \in \mathbb{R}^d$ set $\vartheta_{nc} = \vartheta + n^{-1/2} c$.
Assume that $r_{nc} = r_{\vartheta_{nc}}$ is Hellinger differentiable at $\vartheta$,

$$n \iint \big(r_{nc}^{1/2}(x, y, u) - r_\vartheta^{1/2}(x, y, u) - \tfrac{1}{2} n^{-1/2} c^\top \varrho_\vartheta(x, y, u)\, r_\vartheta^{1/2}(x, y, u)\big)^2\, \nu(du)\, P_2(d(x, y)) \to 0. \tag{17}$$

Let $\mathcal{Q}$ denote the set of all transition distributions of the embedded Markov chain. For $v \in V$ choose a
sequence $Q_{nv}$ in $\mathcal{Q}$ that is Hellinger differentiable (13) at $Q$. Then the assumptions of Section 2 hold with
$\Delta = \mathcal{Q} \times \Theta$, $K = V \times \mathbb{R}^d$, $D_Q(v, c) = v$, $D_R(v, c) = c^\top \varrho_\vartheta$. The functional to be estimated is $\varphi(Q, \vartheta) = \vartheta$.
By orthogonality of $V$ and $W$, its canonical gradient is obtained from (6) as $(0, c_\vartheta^\top \varrho_\vartheta)$ with $d \times d$ matrix
$c_\vartheta$ determined by

$$c = m^{-1} c_\vartheta^\top J_\vartheta\, c, \qquad c \in \mathbb{R}^d,$$

i.e. $c_\vartheta = m J_\vartheta^{-1}$. Hence the canonical gradient of $\vartheta$ is $m J_\vartheta^{-1}(0, \varrho_\vartheta)$ and equals the influence function of $\hat\vartheta_R$,
which is therefore efficient for the correctly specified model.

Suppose now that the model is misspecified, and let $\mathcal{R}$ be the set of all conditional distributions of the
inter-arrival times. Let $R$ denote the true conditional distribution. For $w \in W$ choose a sequence $R_{nw}$
in $\mathcal{R}$ that is Hellinger differentiable (12) at $R$. Then the assumptions of Section 2 hold with $\Delta = \mathcal{Q} \times \mathcal{R}$,
$K = V \times W$, $D_Q(v, w) = v$, $D_R(v, w) = w$. The functional to be estimated is $\varphi(Q, R) = K_R(P_3)$.
Heuristically,

$$0 = P_{3nvw}[\varrho_{K_R(P_{3nvw})}] = P_{3nvw}[\varrho_{K_R(P_3)}] + P_{3nvw}[\dot\varrho_{K_R(P_3)}]\,\big(K_R(P_{3nvw}) - K_R(P_3)\big) + o(n^{-1/2}).$$

With $P_{3nvw}[\dot\varrho_{K_R(P_3)}] \to P_3[\dot\varrho_{K_R(P_3)}]$ we obtain

$$K_R(P_{3nvw}) - K_R(P_3) = -\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1} P_{3nvw}[\varrho_{K_R(P_3)}] + o(n^{-1/2}).$$

Write $P_{3nvw} = P_{2nv} \otimes R_{nw}$ and apply the perturbation expansion (14) to obtain

$$\begin{aligned}
n^{1/2}\big(K_R(P_{3nvw}) - K_R(P_3)\big) &\to -\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1}\big(P_2[v\,AR\varrho_{K_R(P_3)}] + P_3[w\,\varrho_{K_R(P_3)}]\big) \\
&= -\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1}\big(P_2[v\,AR\varrho_{K_R(P_3)}] + P_3[w\,(\varrho_{K_R(P_3)} - R\varrho_{K_R(P_3)})]\big),
\end{aligned}$$

and the canonical gradient of $K_R$ is obtained from (6) as

$$-m\,\big(P_3[\dot\varrho_{K_R(P_3)}]\big)^{-1}\big(AR\varrho_{K_R(P_3)},\ \varrho_{K_R(P_3)} - R\varrho_{K_R(P_3)}\big)$$

and equals the influence function of $\hat\vartheta_R$, which is therefore efficient for the misspecified model.



5. Model S 

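While Models Q and R are semiparametric, Model S is parametric. In Model S we assume parametric
models $q_\vartheta$ and $r_\vartheta$, $\vartheta \in \Theta \subset \mathbb{R}^d$, for the $\mu$-density of the transition distribution of the embedded Markov
chain and for the $\nu$-density of the conditional inter-arrival time. We have $s_\vartheta(x, y, u) = q_\vartheta(x, y)\, r_\vartheta(x, y, u)$.
Hence the KL functional $K_S(P_3)$ maximizes $P_3[\log s_\vartheta] = P_2[\log q_\vartheta] + P_3[\log r_\vartheta]$, and the maximum
likelihood estimator $\hat\vartheta_S$ maximizes $\mathbb{P}_3[\log s_\vartheta] = \mathbb{P}_2[\log q_\vartheta] + \mathbb{P}_3[\log r_\vartheta]$. Write

$$\sigma_\vartheta(x, y, u) = \partial_\vartheta \log s_\vartheta(x, y, u) = \chi_\vartheta(x, y) + \varrho_\vartheta(x, y, u)$$

for the $d$-dimensional vector of partial derivatives of $\log s_\vartheta(x, y, u)$. Then $K_S(P_3)$ solves $P_3[\sigma_\vartheta] = P_2[\chi_\vartheta] + P_3[\varrho_\vartheta] = 0$,
and $\hat\vartheta_S$ solves $\mathbb{P}_3[\sigma_\vartheta] = \mathbb{P}_2[\chi_\vartheta] + \mathbb{P}_3[\varrho_\vartheta] = 0$. Taylor expansions analogous to (9) and (15) imply

$$0 = \frac{1}{N}\sum_{j=1}^N \sigma_{K_S(P_3)}(X_{j-1}, X_j, U_j) + \frac{1}{N}\sum_{j=1}^N \dot\sigma_{K_S(P_3)}(X_{j-1}, X_j, U_j)\,\big(\hat\vartheta_S - K_S(P_3)\big) + o_p(n^{-1/2}),$$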



Ursula U. Müller, Anton Schick and Wolfgang Wefelmeyer

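where $\dot\sigma_\vartheta(x, y, u) = \dot\chi_\vartheta(x, y) + \dot\varrho_\vartheta(x, y, u)$ is the $d \times d$ matrix of partial derivatives of $\sigma_\vartheta(x, y, u)$. We obtain

$$n^{1/2}\big(\hat\vartheta_S - K_S(P_3)\big) = -m\,\big(P_3[\dot\sigma_{K_S(P_3)}]\big)^{-1}\, n^{-1/2}\sum_{j=1}^N \sigma_{K_S(P_3)}(X_{j-1}, X_j, U_j) + o_p(1). \tag{18}$$

If Model S is correctly specified with $Q = Q_\vartheta$ and $R = R_\vartheta$, then $K_S(P_3) = \vartheta$. From Sections 3 and 4 we
obtain the Fisher information matrix for Model S as $I_\vartheta + J_\vartheta$. Hence, for the correctly specified model, the
maximum likelihood estimator $\hat\vartheta_S$ has the stochastic expansion

$$n^{1/2}\big(\hat\vartheta_S - \vartheta\big) = m\,(I_\vartheta + J_\vartheta)^{-1}\, n^{-1/2}\sum_{j=1}^N \sigma_\vartheta(X_{j-1}, X_j, U_j) + o_p(1).$$

This means that $\hat\vartheta_S$ is asymptotically linear with influence function $m(I_\vartheta + J_\vartheta)^{-1}(\chi_\vartheta, \varrho_\vartheta)$, and $n^{1/2}(\hat\vartheta_S - \vartheta)$
is asymptotically normal with covariance matrix $m(I_\vartheta + J_\vartheta)^{-1}$.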

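If the model is misspecified, then $\chi_{K_S(P_3)}$ is not in $V^d$ and $\varrho_{K_S(P_3)}$ is not in $W^d$. We apply the martingale
approximation (7) to (18) and see that $\hat\vartheta_S$ is asymptotically linear with influence function

$$-m\,\big(P_3[\dot\sigma_{K_S(P_3)}]\big)^{-1}\big(A\chi_{K_S(P_3)} + AR\varrho_{K_S(P_3)},\ \varrho_{K_S(P_3)} - R\varrho_{K_S(P_3)}\big).$$

Hence $n^{1/2}(\hat\vartheta_S - K_S(P_3))$ is asymptotically normal with covariance matrix

$$m\,\big(P_3[\dot\sigma_{K_S(P_3)}]\big)^{-1}\,\Sigma_S\,\big(P_3[\dot\sigma_{K_S(P_3)}]\big)^{-1},$$

where

$$\Sigma_S = P_2\big[A(\chi_{K_S(P_3)} + R\varrho_{K_S(P_3)})\,\big(A(\chi_{K_S(P_3)} + R\varrho_{K_S(P_3)})\big)^\top\big] + P_3\big[(\varrho_{K_S(P_3)} - R\varrho_{K_S(P_3)})(\varrho_{K_S(P_3)} - R\varrho_{K_S(P_3)})^\top\big].$$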

Let us now prove efficiency of $\hat\vartheta_S$, first for the correctly specified model. For $c \in \mathbb{R}^d$ set $\vartheta_{nc} = \vartheta + n^{-1/2} c$.
Assume that $q_{nc} = q_{\vartheta_{nc}}$ is Hellinger differentiable (11) at $\vartheta$, and $r_{nc} = r_{\vartheta_{nc}}$ is Hellinger differentiable (17) at
$\vartheta$. Then the assumptions of Section 2 hold with $\Delta = \Theta$, $K = \mathbb{R}^d$, $D_Q c = c^\top \chi_\vartheta$, $D_R c = c^\top \varrho_\vartheta$. The functional
to be estimated is $\varphi(\vartheta) = \vartheta$. The canonical gradient is obtained from (6) as $m(I_\vartheta + J_\vartheta)^{-1}(\chi_\vartheta, \varrho_\vartheta)$. It equals
the influence function of $\hat\vartheta_S$, which is therefore efficient in the correctly specified model.

Suppose now that the model is misspecified. Let $\mathcal{Q}$ be the set of all transition distributions of the
embedded Markov chain, and let $\mathcal{R}$ be the set of all conditional distributions of the inter-arrival
times. For $v \in V$ choose a sequence $Q_{nv}$ in $\mathcal{Q}$ that is Hellinger differentiable (13) at $Q$. For $w \in W$
choose a sequence $R_{nw}$ in $\mathcal{R}$ that is Hellinger differentiable (12) at $R$. Then the assumptions of Section
2 hold with $\Delta = \mathcal{Q} \times \mathcal{R}$, $K = V \times W$, $D_Q(v, w) = v$, $D_R(v, w) = w$. The functional to be estimated is
$\varphi(Q, R) = K_S(P_3)$. Similarly as in Section 4,

$$\begin{aligned}
0 = P_{3nvw}[\sigma_{K_S(P_{3nvw})}] &= P_{3nvw}[\sigma_{K_S(P_3)}] + P_{3nvw}[\dot\sigma_{K_S(P_3)}]\,\big(K_S(P_{3nvw}) - K_S(P_3)\big) + o(n^{-1/2}), \\
K_S(P_{3nvw}) - K_S(P_3) &= -\big(P_3[\dot\sigma_{K_S(P_3)}]\big)^{-1} P_{3nvw}[\sigma_{K_S(P_3)}] + o(n^{-1/2}),
\end{aligned}$$

and therefore

$$n^{1/2}\big(K_S(P_{3nvw}) - K_S(P_3)\big) \to -\big(P_3[\dot\sigma_{K_S(P_3)}]\big)^{-1}\big(P_2[v\,A(\chi_{K_S(P_3)} + R\varrho_{K_S(P_3)})] + P_3[w\,(\varrho_{K_S(P_3)} - R\varrho_{K_S(P_3)})]\big).$$

Hence by (6) the canonical gradient of $K_S$ is obtained as

$$-m\,\big(P_3[\dot\sigma_{K_S(P_3)}]\big)^{-1}\big(A\chi_{K_S(P_3)} + AR\varrho_{K_S(P_3)},\ \varrho_{K_S(P_3)} - R\varrho_{K_S(P_3)}\big)$$

and equals the influence function of $\hat\vartheta_S$, which is therefore efficient for the misspecified model.

6. Remarks

In this section we comment on examples and possible extensions of our results. 

1. If the distribution of the inter-arrival times charges only 1, so that $R(x, y, du) = \delta_1(du)$, then the
semi-Markov process reduces to a Markov chain with transition distribution $Q$, and for Model Q we recover
the results of Greenwood and Wefelmeyer [15].

2. Our results carry over to observations $(X_0, T_0), \dots, (X_n, T_n)$ of the embedded Markov renewal process.
Just replace $N$ by $n$. In particular, instead of the central limit theorem (3) with random summation index
$N$, use

$$n^{-1/2}\sum_{j=1}^n f(X_{j-1}, X_j, U_j) \Rightarrow \big(P_3[f^2]\big)^{1/2}\,Y$$

and replace $m$ by 1 everywhere.



In some examples we can describe the KL functional more explicitly. 

3. Suppose the embedded Markov chain is a linear autoregressive model of order 1, i.e. $X_j = \vartheta X_{j-1} + \varepsilon_j$,
where $\vartheta \in \mathbb{R}$ and the innovations $\varepsilon_j$ are i.i.d. with mean 0, finite variance, and known density $f$. Then
Model Q holds with $Q_\vartheta(x, dy) = f(y - \vartheta x)\,dy$, and $\chi_\vartheta(x, y) = x\,\ell(y - \vartheta x)$ with $\ell = -f'/f$. Hence the KL
functional solves $E[X_0\,\ell(X_1 - \vartheta X_0)] = 0$. If $f$ is the density of $\tau Y$ for some $\tau > 0$, then $\ell(x) = \tau^{-2} x$ and
$E[X_0\,\ell(X_1 - \vartheta X_0)] = \tau^{-2}\big(E[X_0 X_1] - \vartheta E[X_0^2]\big)$. Hence the KL functional is $K_Q(P_2) = E[X_0 X_1] / E[X_0^2]$,
and the partial maximum likelihood estimator for $\vartheta$ is the least squares estimator

$$\hat\vartheta_Q = \frac{\sum_{j=1}^N X_{j-1} X_j}{\sum_{j=1}^N X_{j-1}^2},$$

a ratio of two empirical estimators.
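
The least squares estimator of Remark 3 is easy to compute. The sketch below is an illustration only: the function names are hypothetical, the innovations are simulated as Gaussian with standard deviation `tau`, the chain is started at 0 rather than in stationarity, and a fixed number of steps replaces the random index $N$.

```python
import random

def least_squares_ar1(xs):
    """Ratio of sum X_{j-1} X_j to sum X_{j-1}^2 over successive pairs of xs."""
    num = sum(a * b for a, b in zip(xs[:-1], xs[1:]))
    den = sum(a * a for a in xs[:-1])
    return num / den

def simulate_ar1(theta, n_steps, rng, tau=1.0):
    """Simulate X_j = theta * X_{j-1} + eps_j with Gaussian eps_j of standard deviation tau."""
    xs = [0.0]
    for _ in range(n_steps):
        xs.append(theta * xs[-1] + rng.gauss(0.0, tau))
    return xs
```

On the deterministic sequence 1, 2, 4, 8 the estimator returns 42/21 = 2, and on a long simulated path with `theta = 0.5` it should be close to 0.5.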

4. Suppose the inter-arrival time $U_j$ given $X_{j-1} = x$ and $X_j = y$ is exponentially distributed with mean
$1/\lambda(x)$ not depending on $y$,

$$R(x, y, du) = \lambda(x)\exp(-u\lambda(x))\,du.$$

Then the semi-Markov process is a Markov step process. If the mean is constant, $\lambda(x) = \vartheta$, $\vartheta > 0$, then
Model R holds with $R_\vartheta(x, y, du) = \vartheta\exp(-\vartheta u)\,du$, and $\varrho_\vartheta(x, y, u) = \vartheta^{-1} - u$. Hence the KL functional solves
$E[\varrho_\vartheta(X_0, X_1, U_1)] = \vartheta^{-1} - E[U_1] = 0$, and we obtain $K_R(P_3) = 1/E[U_1]$. The partial maximum likelihood
estimator for $\vartheta$ is

$$\hat\vartheta_R = \frac{N}{\sum_{j=1}^N U_j},$$

a function of an empirical estimator. Efficiency of empirical estimators in Markov step processes is studied
in Greenwood and Wefelmeyer.
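
Remark 4 can be illustrated under misspecification: if the inter-arrival times are not exponential, the estimator still targets the KL functional $1/E[U_1]$. In the sketch below, which is not from the paper, the inter-arrival times are drawn i.i.d. from a gamma distribution with shape 2 and scale 1.5 (mean 3), independently of the states, and a fixed sample size replaces the random index $N$; the function name is hypothetical.

```python
import random

def exponential_rate_mle(us):
    """Partial maximum likelihood estimator N / sum(U_j) for the constant-rate exponential model."""
    return len(us) / sum(us)

rng = random.Random(2)
# Misspecified case: gamma inter-arrival times with mean 2.0 * 1.5 = 3.0,
# so the KL functional is 1 / E[U_1] = 1/3.
gamma_times = [rng.gammavariate(2.0, 1.5) for _ in range(20000)]
rate_estimate = exponential_rate_mle(gamma_times)
```

Here `rate_estimate` should be close to 1/3, the reciprocal of the mean inter-arrival time, although no exponential rate generated the data.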



The models Q, R and S are described in terms of the conditional distributions $Q(x, dy)$ and $R(x, y, du)$.
It is occasionally reasonable to model instead the marginal distributions $P_1$, $P_2$ or $P_3$. Results for these
three models differ considerably among each other and from Models Q, R and S.

5. Suppose we have a parametric model for the $\mu$-density $p_{1\vartheta}$ of $P_1$. The marginal maximum likelihood
estimator $\hat\vartheta_1$ maximizes

$$\mathbb{P}_1[\log p_{1\vartheta}] = \frac{1}{N}\sum_{j=1}^N \log p_{1\vartheta}(X_{j-1}).$$

It estimates the KL functional $K(P_1)$, the parameter that maximizes $P_1[\log p_{1\vartheta}]$. Note that the marginal
maximum likelihood estimator is an empirical version of the KL functional, $\hat\vartheta_1 = K(\mathbb{P}_1)$.

However, $\hat\vartheta_1$ is not efficient for $\vartheta$ when the marginal model is correctly specified. The reason is that
the specification $p_{1\vartheta}$ of the marginal density implies a constraint on the conditional distribution $Q$ of
the embedded Markov chain, but the marginal maximum likelihood estimator does not use this information.
An efficient estimator for $\vartheta$ is difficult to construct. See Kessler, Schick and Wefelmeyer [24] for an
efficient estimator of $\vartheta$ in a Markov chain model with a (correctly specified) parametric model for the
(one-dimensional) marginal density. On the other hand, $\hat\vartheta_1$ is efficient for $K(P_1)$ in a nonparametric sense
when the marginal model is misspecified.

We note that, in this respect, semi-Markov processes and Markov chains are different from the i.i.d.
case. Suppose we have i.i.d. observations $(X_j, Y_j)$ with joint distribution $p_{1\vartheta}(x)\,dx\,Q(x, dy)$, where $Q$ is
unknown. Then $Q$ is not constrained by the marginal model $p_{1\vartheta}$, and the marginal maximum likelihood
estimator is efficient for $\vartheta$ if the marginal model is correctly specified, and also efficient for $K(P_1)$ if the
marginal model is misspecified.

6. Suppose we have a parametric model for the $\mu^2$-density $p_{2\vartheta}$ of $P_2$. The marginal maximum likelihood
estimator $\hat\vartheta_2$ maximizes

$$\mathbb{P}_2[\log p_{2\vartheta}] = \frac{1}{N}\sum_{j=1}^N \log p_{2\vartheta}(X_{j-1}, X_j).$$

It estimates the KL functional $K(P_2)$, the parameter that maximizes $P_2[\log p_{2\vartheta}]$, and $\hat\vartheta_2 = K(\mathbb{P}_2)$. The
perturbation expansion (8) suggests that maximizing $\mathbb{P}_2[\log p_{2\vartheta}]$ is asymptotically equivalent to solving
$\mathbb{P}_2[A\chi_\vartheta] = 0$, and the martingale approximation (7) suggests that this is asymptotically equivalent to
solving $\mathbb{P}_2[\chi_\vartheta] = 0$. Hence the marginal maximum likelihood estimator $\hat\vartheta_2$ is asymptotically equivalent to
the conditional maximum likelihood estimator $\hat\vartheta_Q$ and therefore efficient in the correctly specified model
$p_{2\vartheta}$. The reason is that $p_{2\vartheta}(x, y) = p_{1\vartheta}(x)\, q_\vartheta(x, y)$, and $q_\vartheta(x, y)$ determines $p_{1\vartheta}$, which therefore does not
contain additional information about $\vartheta$.

This is again different from the i.i.d. case. Suppose we have i.i.d. observations $(X_j, Y_j)$ with joint density
$p_{1\vartheta}(x)\, q_\vartheta(x, y)$. Then $p_{1\vartheta}$ contains, in general, additional information about $\vartheta$.

7. Suppose we have a parametric model for the fj 2 <8> ^-density p^ of P3. The marginal maximum likelihood 
estimator $3 maximizes 

1 N 

i=i 

It estimates the KL functional K(P^), the parameter that maximizes P^ogp^], and $3 = K(P^). We can 
write P3$(x,y,u) = P2^(x,y)r^(x,y,u). Now r$(x,y,u) carries additional information about i9, similarly 
as in the i.i.d. case. 

8. Remark 5 tells us in particular the following, rather obvious, fact. If a parametric estimator is efficient 
in a nonparametric sense, then the reason is not that it is efficient in a parametric model. Rather, an 
estimator usually is nonparametrically efficient because it is a smooth function of an empirical estimator. 
We can illustrate this also with Model S. Suppose we have parametric models $q_\vartheta$ and $r_\vartheta$ for the densities of
$Q$ and $R$. Let $\hat\vartheta_q = K_q(\mathbb{P}_2)$ be the conditional maximum likelihood estimator based on the model $q_\vartheta$ alone.
In general, $\hat\vartheta_q$ will not be efficient for $\vartheta$ when Model S is correctly specified, because $\hat\vartheta_q$ does not use the
information about $\vartheta$ in the model $r_\vartheta$. But if both $q_\vartheta$ and $r_\vartheta$ are misspecified, $\hat\vartheta_q$ will be nonparametrically
efficient for $K_q(P_2)$, which is the KL functional for Model Q but not for Model S.
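
The statement that such an estimator is a smooth function of an empirical estimator can be illustrated numerically. The sketch below is not from the paper. It ignores the inter-arrival times and uses a two-state Markov chain together with a deliberately misspecified one-parameter "flip" model, $q_\vartheta(x,y) = \vartheta$ if $y \neq x$ and $1 - \vartheta$ otherwise. The transition matrix, the model, and all function names are assumptions chosen for illustration. For this model the maximizer of $\mathbb{P}_2[\log q_\vartheta]$ is available in closed form as $\mathbb{P}_2(0,1) + \mathbb{P}_2(1,0)$, a linear function of the empirical pair distribution.

```python
import numpy as np


def simulate_two_state_chain(Q, n_steps, rng):
    """Simulate X_0, ..., X_n for a two-state chain with transition matrix Q, X_0 = 0."""
    u = rng.random(n_steps)
    x = np.zeros(n_steps + 1, dtype=int)
    for j in range(n_steps):
        # Move to state 1 with probability Q[x_j, 1], otherwise to state 0.
        x[j + 1] = 1 if u[j] < Q[x[j], 1] else 0
    return x


def empirical_pair_distribution(x):
    """Empirical distribution of the consecutive pairs (X_{j-1}, X_j)."""
    counts = np.zeros((2, 2))
    np.add.at(counts, (x[:-1], x[1:]), 1.0)
    return counts / counts.sum()


def flip_model_objective(P2, theta):
    """P2[log q_theta] for the hypothetical flip model."""
    flip = P2[0, 1] + P2[1, 0]
    return flip * np.log(theta) + (1.0 - flip) * np.log(1.0 - theta)


def kl_functional_flip_model(P2):
    """Closed-form maximizer of P2[log q_theta] over theta in (0, 1)."""
    return P2[0, 1] + P2[1, 0]


rng = np.random.default_rng(0)
Q_true = np.array([[0.9, 0.1], [0.4, 0.6]])  # not of the flip form: model is misspecified

# Estimator: the KL functional evaluated at the empirical pair distribution.
x = simulate_two_state_chain(Q_true, 200_000, rng)
P2_hat = empirical_pair_distribution(x)
theta_hat = kl_functional_flip_model(P2_hat)

# Target: the same functional evaluated at the true stationary pair distribution.
pi = np.array([Q_true[1, 0], Q_true[0, 1]]) / (Q_true[0, 1] + Q_true[1, 0])
P2_true = pi[:, None] * Q_true
theta_star = kl_functional_flip_model(P2_true)

# Cross-check the closed form against a direct grid maximization of the objective.
thetas = np.linspace(0.001, 0.999, 999)
theta_grid = thetas[np.argmax(flip_model_objective(P2_hat, thetas))]

print(theta_hat, theta_star, theta_grid)
```

Although no value of $\vartheta$ reproduces the true transition matrix, the estimator settles near the value of the KL functional at the true pair distribution, and it does so only through the empirical pair distribution.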



References 



[1] Andrews, D. W. K. and Pollard, D., 1994, An introduction to functional central limit theorems for dependent stochastic processes.
Internat. Statist. Rev. 62, 119-132.

[2] Bickel, P. J., 1993, Estimation in semiparametric models. In: (C. R. Rao, Ed.) Multivariate Analysis: Future Directions
(Amsterdam: North-Holland), pp. 55-73. MR1246354

[3] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A., 1998, Efficient and Adaptive Estimation for Semiparametric Models
(New York: Springer). MR1623559

[4] Bickel, P. J. and Kwon, J., 2001, Inference for semiparametric models: Some questions and an answer (with discussion). Statist.
Sinica 11, 863-960. MR1867326

[5] Dahlhaus, R. and Wefelmeyer, W., 1996, Asymptotically optimal estimation in misspecified time series models. Ann. Statist. 24,
952-974. MR1401832

[6] Daniels, H. E., 1961, The asymptotic efficiency of a maximum likelihood estimator. Proc. Fourth Berkeley Sympos. Math. Statist.
and Probability 1, 151-163. MR0131924

[7] Doksum, K., Ozeki, A., Kim, J. and Neto, E. C., 2007, Thinking outside the box: Statistical inference based on Kullback-Leibler
empirical projections. Statist. Probab. Lett. 77, 1201-1213.

[8] Dürr, D. and Goldstein, S., 1986, Remarks on the central limit theorem for weakly dependent random variables. In: (S. Albeverio,
P. Blanchard and L. Streit, Eds.) Stochastic Processes - Mathematics and Physics, Lecture Notes in Mathematics 1158 (Berlin:
Springer), pp. 104-118. MR0838560

[9] Gordin, M. I., 1969, The central limit theorem for stationary processes. Soviet Math. Dokl. 10, 1174-1176. MR0251785

[10] Gordin, M. I. and Lifšic, B. A., 1978, The central limit theorem for stationary Markov processes. Soviet Math. Dokl. 19, 392-394.

[11] Greenwood, P. E., Müller, U. U. and Wefelmeyer, W., 2004, Efficient estimation for semiparametric semi-Markov processes. Comm.
Statist. Theory Methods 33, 419-435. MR2056947

[12] Greenwood, P. E. and Wefelmeyer, W., 1994, Nonparametric estimators for Markov step processes. Stochastic Process. Appl. 52,
1-16. MR1289165

[13] Greenwood, P. E. and Wefelmeyer, W., 1995, Efficiency of empirical estimators for Markov chains. Ann. Statist. 23, 132-143.
MR1331660

[14] Greenwood, P. E. and Wefelmeyer, W., 1996, Empirical estimators for semi-Markov processes. Math. Meth. Statist. 5, 299-315.
MR1417674

[15] Greenwood, P. E. and Wefelmeyer, W., 1997, Maximum likelihood estimator and Kullback-Leibler information in misspecified
Markov chain models. Theory Probab. Appl. 42, 103-111. MR1453336

[16] Höpfner, R., 1993a, On statistics of Markov step processes: representation of log-likelihood ratio processes in filtered local models.
Probab. Theory Related Fields 94, 375-398. MR1198653

[17] Höpfner, R., 1993b, Asymptotic inference for Markov step processes: observation up to a random time. Stochastic Process. Appl.
48, 295-310. MR1244547

[18] Höpfner, R., Jacod, J. and Ladelli, L., 1990, Local asymptotic normality and mixed normality for Markov statistical models.
Probab. Theory Related Fields 86, 105-129. MR1061951

[19] Hosoya, Y., 1989, The bracketing condition for limit theorems on stationary linear processes. Ann. Statist. 17, 401-418. MR0981458

[20] Huber, P. J., 1967, The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Sympos.
Math. Statist. and Probability 1, 221-233. MR0216620

[21] Kartashov, N. V., 1985a, Criteria for uniform ergodicity and strong stability of Markov chains with a common phase space. Theory
Probab. Math. Statist. 30, 71-89.

[22] Kartashov, N. V., 1985b, Inequalities in theorems of ergodicity and stability for Markov chains with common phase space. I. Theory
Probab. Appl. 30, 247-259.

[23] Kartashov, N. V., 1996, Strong Stable Markov Chains (Utrecht: VSP). MR1451375

[24] Kessler, M., Schick, A. and Wefelmeyer, W., 2001, The information in the marginal law of a Markov chain. Bernoulli 7, 243-266.
MR1828505

[25] Kutoyants, Yu. A., 1988, On an identification problem for dynamical systems with small noise. Izv. Akad. Nauk Armyan. SSR 23,
270-285. MR0976484

[26] Kutoyants, Yu. A., 2004, Statistical Inference for Ergodic Diffusion Processes, Springer Series in Statistics (London: Springer).
MR2144185

[27] Maigret, N., 1978, Théorème de limite centrale fonctionnel pour une chaîne de Markov récurrente au sens de Harris et positive.
Ann. Inst. H. Poincaré Probab. Statist. 14, 425-440. MR0523221

[28] McKeague, I. W., 1984, Estimation for diffusion processes under misspecified models. J. Appl. Probab. 21, 511-520. MR0752016

[29] Meyn, S. P. and Tweedie, R. L., 1993, Markov Chains and Stochastic Stability (London: Springer). MR1287609

[30] Müller, U. U., 2007, Weighted least squares estimators in possibly misspecified nonlinear regression. Metrika 66, 39-59. MR2306376

[31] Ogata, Y., 1980, Maximum likelihood estimates of incorrect Markov models for time series and the derivation of AIC. J. Appl.
Probab. 17, 59-72. MR0557435

[32] Penev, S., 1991, Efficient estimation of the stationary distribution for exponentially ergodic Markov chains. J. Statist. Plann.
Inference 27, 105-123. MR1089356

[33] Pollard, D., 1985, New ways to prove central limit theorems. Econometric Theory 1, 295-314.

[34] Sin, C.-Y. and White, H., 1996, Information criteria for selecting possibly misspecified parametric models. J. Econometrics 71,
207-225. MR1381082

[35] White, H., 1982, Maximum likelihood estimation of misspecified models. Econometrica 50, 1-25. MR0640163

[36] White, H., 1984, Maximum likelihood estimation of misspecified dynamic models. In: T. K. Dijkstra (Ed.) Misspecification Analysis,
Lecture Notes in Economics and Mathematical Systems 237 (Berlin: Springer), pp. 1-19. MR0791952

[37] White, H., 1994, Estimation, Inference and Specification Analysis, Econometric Society Monographs 22 (Cambridge: Cambridge
University Press). MR1292251



